Beijing’s central role in global artificial intelligence research

Nations worldwide are mobilizing to harness the power of Artificial Intelligence (AI) given its massive potential to shape global competitiveness over the coming decades. Using a dataset of 2.2 million AI papers, we study inter-city citations, collaborations, and talent migrations to uncover dependencies between Eastern and Western cities worldwide. Beijing emerges as a clear outlier, as it has been the most impactful city since 2007, the most productive since 2002, and the one housing the largest number of AI scientists since 1995. Our analysis also reveals that Western cities cite each other far more frequently than expected by chance, East–East collaborations are far more common than East–West or West–West collaborations, and migration of AI scientists mostly takes place from one Eastern city to another. We then propose a measure that quantifies each city’s role in bridging East and West. Beijing’s role surpasses that of all other cities combined, making it the central gateway through which knowledge and talent flow from one side to the other. We also track the center of mass of AI research by weighing each city’s geographic location by its impact, productivity, and AI workforce. The center of mass has moved thousands of kilometers eastward over the past three decades, with Beijing’s pull increasing each year. These findings highlight the eastward shift in the tides of global AI research, and the growing role of the Chinese capital as a hub connecting researchers across the globe.

where $\mu_{ij}$ is an error term statistically independent of $n_i$ and $m_j$ with $E[\mu_{ij} \mid n_i, m_j] = 1$, while $\alpha$ and $\beta$ capture the relative importance of the productivity of $i$ and the number of papers cited by $j$, respectively. The expected impact, conditional on $n_i$ and $m_j$, is equal to $E[m_{ij} \mid n_i, m_j] = n^{-1} n_i^{\alpha} m_j^{\beta}$. We estimate $\alpha$ and $\beta$ to determine the expected impact empirically. In particular, if $\alpha = \beta = 1$ then the expected impact described by Equation (1) coincides with the random citation baseline model. We estimate Equation (1) using standard techniques developed in the international trade literature. We follow the approach in [4] and compute the Poisson pseudo-maximum likelihood estimator (PPML) to calculate the expected impact. Poisson estimators are flexible enough to estimate Equation (1) directly, without the need to linearize it, thereby retaining city pairs for which the citation values are zero. Furthermore, Poisson estimators do not require the homoskedasticity assumption. Supplementary Table 1 presents the estimation of the exponents in Equation (1). Each column shows the results for a year between 2013 and 2017. As can be seen, although we reject the null hypothesis that $\alpha = 1$ at conventional levels in each specification, the estimated coefficients $\hat{\alpha}$ and $\hat{\beta}$ are very close to 1 across columns. In year 2017, for example, using $N = 6{,}119{,}444$ city pairs yields coefficients $\hat{\alpha} = 1.145$ (s.e. 0.029) and $\hat{\beta} = 1.000$ (s.e. 0.017); a constant coefficient of $-12.509$ (s.e. 0.110), which implies a baseline citation between $i$ and $j$ of $e^{-12.509} \approx 0$; and a pseudo-$R^2$ of 0.568. The precision with which the coefficients are estimated suggests that the random citation model provides a good approximation on average, but there may be other variables that might capture part of the unexplained variation. In any case, these empirical results align with the random citation model, albeit not perfectly.
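To make the estimation step concrete, the following is a minimal sketch of a Poisson pseudo-maximum likelihood fit of $\log E[m_{ij}] = c + \alpha \log n_i + \beta \log m_j$, written in plain Python with iteratively reweighted least squares. It is not the authors' code: the function names `solve` and `ppml`, the starting values, and the fixed iteration count are assumptions made for illustration, and an analysis over millions of city pairs would normally rely on a statistics package.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ppml(pairs, iters=50):
    """Fit log E[m_ij] = c + alpha*log(n_i) + beta*log(m_j).

    pairs: list of (n_i, m_j, m_ij) with n_i, m_j > 0; m_ij may be zero,
    so city pairs without citations stay in the sample.
    Returns (c, alpha, beta).
    """
    X = [[1.0, math.log(n_i), math.log(m_j)] for n_i, m_j, _ in pairs]
    y = [m_ij for _, _, m_ij in pairs]
    eta = [math.log(v + 0.1) for v in y]  # assumed starting values
    coef = [0.0, 0.0, 0.0]
    for _ in range(iters):
        mu = [math.exp(e) for e in eta]
        # Working response of the Poisson model with log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        XtWX = [[sum(m * x[a] * x[b] for x, m in zip(X, mu))
                 for b in range(3)] for a in range(3)]
        XtWz = [sum(m * x[a] * zi for x, m, zi in zip(X, mu, z))
                for a in range(3)]
        coef = solve(XtWX, XtWz)
        eta = [sum(cf * xv for cf, xv in zip(coef, x)) for x in X]
    return tuple(coef)
```

On noise-free synthetic data generated from known exponents, the fit should recover those exponents, which is a quick sanity check before applying the same idea to real citation counts.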
We compute the difference between actual impact $m_{ij}$ and expected impact $\hat{m}_{ij} = n^{-1} n_i^{\hat{\alpha}} m_j^{\hat{\beta}}$ for each year from 2013 to 2017 using the estimates in Supplementary Table 1. In Figure 2b, as well as in Supplementary Figure 7, each cell corresponds to the sum of the differences $(m_{ij} - \hat{m}_{ij})$ for each city pair $(i, j)$ across the five years from 2013 to 2017. Intuitively, this represents the difference between actual and expected impact during those five years.
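The per-cell quantity described above can be sketched as follows. The function names and the tuple layout of the yearly records are hypothetical, chosen only for this example; the formula for the expected impact follows the expression given in the text.

```python
def expected_impact(n, n_i, m_j, alpha, beta):
    """Expected citations n^{-1} * n_i^alpha * m_j^beta for one city pair."""
    return n_i ** alpha * m_j ** beta / n

def excess_impact(yearly):
    """Sum actual-minus-expected citations for one city pair over several years.

    yearly: iterable of (actual, n, n_i, m_j, alpha, beta), one record per year,
    where alpha and beta are that year's estimated exponents.
    """
    return sum(actual - expected_impact(n, n_i, m_j, alpha, beta)
               for actual, n, n_i, m_j, alpha, beta in yearly)
```

A positive value means the pair exchanged more citations than the fitted model predicts over the period, and a negative value means fewer.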

Supplementary
where the additional factor, $F_{ij} = \prod_k (f_{ij}^{k})^{\theta_k}$, is a function of the frictions defined. The goal is to estimate $\alpha$, $\beta$ and the set of parameters $\theta_k$. The error term $\mu_{ij}$ makes the relationship hold with equality and follows the same distributional assumptions described in the previous section. Note that Equation (1) is a particular case of Equation (2) when $\theta_k = 0$ for all $k$.
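As a small illustration of how the friction factor enters, the sketch below multiplies the frictionless expectation $n^{-1} n_i^{\alpha} m_j^{\beta}$ by $F_{ij}$. Treating the expectation as this simple product is an assumption based on the description above, and the function names are hypothetical. Friction values are assumed to be strictly positive so that the powers are well defined.

```python
import math

def friction_factor(frictions, thetas):
    """F_ij = prod_k (f_ij^k) ** theta_k for strictly positive friction values."""
    return math.prod(f ** t for f, t in zip(frictions, thetas))

def expected_with_frictions(n, n_i, m_j, alpha, beta, frictions, thetas):
    """Frictionless expectation scaled by the friction factor (assumed form)."""
    return n_i ** alpha * m_j ** beta / n * friction_factor(frictions, thetas)
```

Setting every `theta_k` to zero makes the factor equal to 1, which recovers the frictionless case noted in the text.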
We follow [4] and estimate Equation (2) using a Pseudo-Maximum Likelihood (PML) estimator that accounts for zero citations. We divide the additional variables into three groups. The first group comprises total scientists at source and at destination. The second group consists of a series of indicator variables that account for distance, including the same-city indicator. We compute the geodesic distance between two cities using their coordinates, downloaded from the Google Geocoding API. The last group includes cultural and geographic variables. We use the CEPII variables [6], which are widely utilized in the trade literature. These variables indicate whether the countries of two cities are contiguous, share a common official language, share English as a common language, ever had a colonial relationship, had a colonial relationship extending after 1945, and share a common colonizer. We include two additional geographic variables that measure the remoteness of both source and destination. Remoteness is simply the weighted average of the distance to all other cities in the world, with weights being the share of AI scientists of those cities over the total number of AI scientists in the world. Supplementary Table 2 shows the Poisson PML estimates using different specifications. Impact is positively and significantly associated with productivity, regardless of the model and specification. Column (1) shows the basic frictionless model estimation for clarity. This result coincides with column (5) in Supplementary Table 1. Each of the remaining columns adds frictions as controls sequentially. Column (2) includes the number of AI scientists. The coefficients on both productivity and destination total citations remain qualitatively the same as in the frictionless model (1). However, the estimates for the parameters of both source and destination total AI scientists are small in magnitude with relatively large standard errors. Column (3) adds distance.
Again, the parameters accompanying productivity and destination total citations remain about the same and are precisely estimated. Regarding distance, we observe evidence of home bias in that citations are positively associated with being in the same city. But closeness also seems to matter: cities within a radius of about 6,000 km are associated with additional citations. The last column adds geographic and cultural controls. The coefficients in the frictionless model remain very close to one, although we reject equality at conventional levels. Regarding controls, same country and common language are the ones that stand out as positive. Colonial ties after 1945, however, are negatively associated with citations. Remoteness of the source city, that is, how far the city is from all other cities, is positively associated with citations. This may indicate that some small centers, such as Redmond in Washington State, U.S., elicit a large number of citations.
The main conclusion from the gravity model with frictions is the robustness of the frictionless model. That is, the size of the source in terms of productivity and the size of the destination in terms of total citations made provide a fairly good explanation of the expected number of citations a city receives through a frictionless gravity model. Nevertheless, frictions explain some of the unaccounted variance. Geographic and cultural closeness, for example, are also positively associated with impact.

Citations to source city j's papers from city i (impact of j on i). Standard errors are in parentheses to the right of each coefficient. Stars denote significance at conventional levels (* p-value < 0.05, ** p-value < 0.01, *** p-value < 0.001).
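The remoteness control used in this note can be sketched as below. The paper computes geodesic distances from geocoded coordinates; the haversine formula here is a substituted spherical approximation, and the function names, the dictionary layout, and the Earth radius constant are assumptions made for the example. The weights follow the wording of the text: each other city's share of the world total of AI scientists.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def remoteness(city, coords, scientists):
    """Distance to every other city, weighted by that city's share of all AI scientists.

    coords: {city: (lat, lon)}; scientists: {city: number of AI scientists}.
    """
    total = sum(scientists.values())
    return sum(scientists[other] / total * haversine_km(coords[city], coords[other])
               for other in coords if other != city)
```

A city far from the places where most AI scientists work receives a high value, while a city close to the large hubs receives a low one.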

Supplementary Note 3: Gravity model of migration with frictions
Drawing on the previous analysis and the rich literature on gravity models, we can also estimate a model of migration that accounts for size and frictions. Equation (2) can be relabeled to explain migration of AI scientists from one city to another. Now, let $m_{ij}$ denote migration from city $j$ to city $i$, $n_i$ the AI scientists supplied to $i$ from all origins, $m_j$ the total AI scientists in $j$, and $n$ the total AI scientists in the world. Using $n_i$ allows us to account for whether migration decisions are associated with the number of scientists at destination. Such decisions might be motivated by agglomeration economies or knowledge spillovers at destination that may enhance the migrant scientist's productivity. The inclusion of $m_j$ accounts for the rather mechanical effect that larger cities should source more scientists to the rest of the world. The migration model is a straightforward relabelling of the citation model above. But note that the number of pairwise observations for scientist migration is smaller than for citations. This is because one scientist can write a large number of papers but migrate very few times. Another clarification is that we use the productivity of $i$ and the total citations made by $j$ as controls to account for production size frictions. Finally, we measure migration in five-year intervals, as people's mobility is slower than knowledge diffusion. Thus, we focus on the period 2015 to 2019. Supplementary Table 3 shows the results of the estimation. Column (1) features the frictionless model. Total AI scientists supplied to $i$ is positively associated with migration to city $i$, although the value is far from 1. AI labor size at source also seems to matter, but the coefficient is estimated less precisely. When adding productivity and impact controls in column (2), the coefficient on total AI scientists supplied to city $i$ doubles and remains significant at conventional levels.
However, adding the productivity and impact controls does little to the correlation between AI labor at source and migration. Columns (3) and (4) add distance and geographic-cultural controls sequentially. In both specifications, the coefficients on AI labor supplied to city $i$ and AI labor in city $j$ are positive (about 0.5) and precisely estimated. These results suggest that AI scientists migrate more frequently to cities that attract other AI scientists from all over the world. They also suggest that the number of AI scientists in a city is associated with higher chances of finding an AI scientist from that city everywhere else. The fact that frictions make the coefficient on AI labor larger and more precise suggests that frictions matter for migration, perhaps more than for citations.

In this section, we quantify the importance of each city in the citation network, the collaboration network, and the migration network using various measures borrowed from the social network analysis toolkit. To this end, for any given node $v_i \in V$, let $P(v_i)$ denote the set of all predecessors of $v_i$ (i.e., all nodes that have edges to $v_i$), let $S(v_i)$ denote the set of all successors of $v_i$ (i.e., all nodes with edges from $v_i$), let $w_{ij}$ denote the weight of the edge from $v_i$ to $v_j$, and let $d_{ij}$ denote the distance between $v_i$ and $v_j$ (i.e., the sum of weight reciprocals along the shortest path between the two nodes). With this notation in place, we are ready to formally define three centrality measures:

Supplementary
• Degree centrality [7]: the importance of a node $v_i$ is determined based on the weights of the edges incident with $v_i$.
• Closeness centrality [8]: the importance of a node $v_i$ is determined based on the distance from $v_i$ to all other nodes (if there is no path from $v_i$ to $v_j$ we assume $d_{ij} = \infty$).
• PageRank centrality [9]: the importance of a node $v_i$ is determined based on the importance of its neighbors. The PageRank centrality is computed by an iterative process, where the centrality of each node $v_i$ in the first round is $c^{1}_{page}(v_i) = 1/|V|$. In each subsequent round $t$, the centrality of node $v_i$ is recomputed using a damping factor of $\gamma = 0.85$. We continue the computation until one of the following two conditions is satisfied: (1) the computation lasts for 1,000,000 iterations; (2) the difference in the centrality sum of all nodes between successive rounds does not exceed $10^{-5}$ for 1,000 consecutive iterations.
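The three centrality measures above can be sketched in plain Python as follows. The exact formulas are not reproduced in this section, so the code uses common textbook forms as stated assumptions: degree sums the weights of incoming and outgoing edges, closeness uses the harmonic form (so unreachable nodes, with infinite distance, contribute zero), and PageRank splits a node's score among its successors in proportion to edge weight. The stopping rule measures the summed absolute change per round, which is one reading of the criterion in the text. All function names and the `{(source, target): weight}` edge layout are hypothetical.

```python
import heapq

def degree(edges, v):
    """Total weight of edges incident with v (incoming plus outgoing; assumed form)."""
    return sum(w for (a, b), w in edges.items() if v in (a, b))

def distances(edges, source):
    """Shortest-path distances from source, where an edge of weight w costs 1/w."""
    succ = {}
    for (a, b), w in edges.items():
        succ.setdefault(a, []).append((b, 1.0 / w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in succ.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def closeness(edges, v):
    """Harmonic closeness (assumed form): sum of 1/d over nodes reachable from v."""
    return sum(1.0 / d for u, d in distances(edges, v).items() if u != v)

def pagerank(edges, gamma=0.85, tol=1e-5, patience=1000, max_iter=1_000_000):
    """Weighted PageRank with damping gamma; edge weights must be positive."""
    nodes = sorted({x for e in edges for x in e})
    out_w = {u: 0.0 for u in nodes}
    for (a, _), w in edges.items():
        out_w[a] += w
    rank = {u: 1.0 / len(nodes) for u in nodes}  # first-round value 1/|V|
    calm = 0
    for _ in range(max_iter):
        # Nodes without outgoing edges spread their score uniformly (assumption).
        dangling = sum(rank[u] for u in nodes if out_w[u] == 0.0)
        new = {u: (1.0 - gamma + gamma * dangling) / len(nodes) for u in nodes}
        for (a, b), w in edges.items():
            new[b] += gamma * rank[a] * w / out_w[a]
        change = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        calm = calm + 1 if change <= tol else 0
        if calm >= patience:
            break
    return rank
```

The defaults for `gamma`, `tol`, `patience`, and `max_iter` mirror the values quoted in the text; a smaller `patience` is convenient when experimenting on toy graphs.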
In addition to the above three centrality measures, we also consider two alternative influence measures. Both measures are based on the idea that influence may propagate through a network by "node activation". The basic idea is that when a certain node is sufficiently influenced by its neighbours, it becomes "active", in which case it starts influencing its "inactive" neighbours. To initiate this influence propagation process, one of the nodes, called the seed node, is activated from the start. Assuming that time is discrete, we denote by $I_t \subseteq V$ the set of nodes that are active at round $t$, implying that $I_1$ is the set consisting of the seed node. The way influence propagates to inactive nodes depends on the influence model under consideration. In this context, two widely-used models are:
• Independent cascade [10]: in every round $t > 1$, every node $v_i \in V$ that became active in round $t - 1$ activates every inactive successor, $v_j \in S(v_i) \setminus I_{t-1}$, with a probability defined in terms of $q = 0.25$, the basic activation probability, and $w^{*}$, the maximal edge weight in the network. The process ends when there are no newly active nodes, i.e., when $I_t = I_{t-1}$.
• Linear threshold [11]: every node $v_i \in V$ is assigned a threshold value $\theta_i$ which is uniformly sampled from the $[0, 1]$ interval. Then, in every round $t > 1$, every inactive node $v_i$ becomes active, i.e., becomes a member of $I_t$, if the total weight of edges from its active predecessors divided by the total weight of all incoming edges meets or exceeds the threshold, i.e., if $\sum_{v_j \in P(v_i) \cap I_{t-1}} w_{ji} \big/ \sum_{v_j \in P(v_i)} w_{ji} \geq \theta_i$. The process ends when there are no newly active nodes, i.e., when $I_t = I_{t-1}$.
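The two propagation models above can be sketched as simulations, with a node's influence approximated by averaging the number of active nodes over many runs. The activation probability of the independent cascade is not reproduced in this section, so the code assumes $q \cdot w_{ij} / w^{*}$, which is only one plausible form given the quantities named in the text. Function names, the `{(source, target): weight}` edge layout, and the number of runs are likewise hypothetical.

```python
import random

def independent_cascade(edges, seed, q=0.25, rng=random):
    """One simulated cascade; returns the set of active nodes.

    Assumed activation probability for edge (u, v): q * w_uv / w_max.
    """
    w_max = max(edges.values())
    succ = {}
    for (a, b), w in edges.items():
        succ.setdefault(a, []).append((b, w))
    active, frontier = {seed}, [seed]
    while frontier:  # stops when a round activates no new node
        newly = []
        for u in frontier:
            for v, w in succ.get(u, []):
                if v not in active and rng.random() < q * w / w_max:
                    active.add(v)
                    newly.append(v)
        frontier = newly
    return active

def linear_threshold(edges, seed, rng=random):
    """One simulated run of the linear threshold model; returns the active set."""
    nodes = sorted({x for e in edges for x in e})
    pred = {v: [] for v in nodes}
    for (a, b), w in edges.items():
        pred[b].append((a, w))
    theta = {v: rng.random() for v in nodes}  # thresholds drawn uniformly
    active = {seed}
    while True:
        newly = set()
        for v in nodes:
            if v in active or not pred[v]:
                continue
            total = sum(w for _, w in pred[v])
            share = sum(w for u, w in pred[v] if u in active) / total
            if share >= theta[v]:
                newly.add(v)
        if not newly:
            return active
        active |= newly

def influence(model, edges, seed, runs=1000, rng=None):
    """Monte Carlo estimate of the expected number of active nodes for a seed."""
    rng = rng or random.Random(0)
    return sum(len(model(edges, seed, rng=rng)) for _ in range(runs)) / runs
```

For example, `influence(linear_threshold, edges, "Beijing")` would estimate the expected spread from a node labelled "Beijing" in a hypothetical `edges` dictionary; more runs give a more stable estimate at higher cost.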
In either model, the influence of a node $v_i$ is defined as the expected number of active nodes when starting with $v_i$ as the seed node. Our results are presented in Tables 4 to 8. As can be seen, regardless of the network under consideration (be it the citation, the collaboration, or the migration network) and regardless of the measure used (be it degree, closeness, PageRank, independent cascade-based influence, or linear threshold-based influence), Beijing is the highest ranked city worldwide.

Table 6: City ranking of top cities according to PageRank centrality in the citation, migration, and collaboration networks. The color indicates whether the city is east (red) or west (blue).